56 research outputs found
Learning the Joint Representation of Heterogeneous Temporal Events for Clinical Endpoint Prediction
The availability of a large amount of electronic health records (EHR)
provides huge opportunities to improve health care service by mining these
data. One important application is clinical endpoint prediction, which aims to
predict whether a disease, a symptom or an abnormal lab test will happen in the
future according to patients' history records. This paper develops deep
learning techniques, which have proven effective in many practical
applications, for clinical endpoint prediction. However, the problem is
challenging since patients' history records contain multiple heterogeneous
temporal events such as lab tests, diagnoses, and drug administrations. The visiting patterns of
different types of events vary significantly, and there exist complex nonlinear
relationships between different events. In this paper, we propose a novel model
for learning the joint representation of heterogeneous temporal events. The
model adds a new gate to control the visiting rates of different events which
effectively models the irregular patterns of different events and their
nonlinear correlations. Experimental results with real-world clinical data on
the tasks of predicting death and abnormal lab tests demonstrate the
effectiveness of our proposed approach over competitive baselines.
Comment: 8 pages, this paper has been accepted by AAAI 201
Joint Language Semantic and Structure Embedding for Knowledge Graph Completion
The task of completing knowledge triplets has broad downstream applications.
Both structural and semantic information plays an important role in knowledge
graph completion. Unlike previous approaches that rely on either the structures
or semantics of the knowledge graphs, we propose to jointly embed the semantics
in the natural language description of the knowledge triplets with their
structure information. Our method embeds knowledge graphs for the completion
task via fine-tuning pre-trained language models with respect to a
probabilistic structured loss, where the forward pass of the language models
captures semantics and the loss reconstructs structures. Our extensive
experiments on a variety of knowledge graph benchmarks have demonstrated the
state-of-the-art performance of our method. We also show that our method can
significantly improve the performance in a low-resource regime, thanks to the
better use of semantics. The code and datasets are available at
https://github.com/pkusjh/LASS.
Comment: COLING 202
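As a rough illustration of a structure-reconstructing probabilistic loss, the sketch below scores a triplet with a translation-style distance and passes it through a sigmoid link. This is an illustrative assumption in the spirit of translation-based embeddings, not LASS's actual objective, which operates on fine-tuned language-model representations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triplet_score(h, r, t):
    """Translation-style structural score: high (near 0) when h + r is close to t."""
    return -np.linalg.norm(h + r - t)

def structured_loss(h, r, t, label, margin=1.0):
    """Binary log-loss on a sigmoid of the (margin-shifted) score.
    label = 1 for a true triplet, 0 for a corrupted one."""
    p = sigmoid(triplet_score(h, r, t) + margin)
    return -np.log(p) if label == 1 else -np.log(1 - p)
```

A true triplet whose tail lands near `h + r` gets low loss; a corrupted tail far from `h + r` gets low loss only under the negative label.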
FIMO: A Challenge Formal Dataset for Automated Theorem Proving
We present FIMO, an innovative dataset comprising formal mathematical problem
statements sourced from the International Mathematical Olympiad (IMO)
Shortlisted Problems. Designed to facilitate advanced automated theorem proving
at the IMO level, FIMO is currently tailored for the Lean formal language. It
comprises 149 formal problem statements, accompanied by both informal problem
descriptions and their corresponding LaTeX-based informal proofs. Through
initial experiments involving GPT-4, our findings underscore the existing
limitations of current methodologies, indicating a substantial journey ahead
before satisfactory IMO-level automated theorem proving is achieved.
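For readers unfamiliar with the format, a FIMO-style entry pairs an informal statement and proof with a formal Lean statement. The toy statement below illustrates the flavor only; it is not an item from the dataset and is far easier than IMO level.

```lean
-- Illustrative only (not a FIMO entry): for all reals a b, 2ab ≤ a² + b².
theorem toy_ineq (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 :=
by nlinarith [sq_nonneg (a - b)]
```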
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models
Automated theorem proving (ATP) has become an appealing domain for exploring
the reasoning ability of the recent successful generative language models.
However, current ATP benchmarks mainly focus on symbolic inference and rarely
involve the understanding of complex number-combination reasoning. In this
work, we propose TRIGO, an ATP benchmark that not only requires a model to
reduce a trigonometric expression with step-by-step proofs but also evaluates a
generative LM's reasoning ability on formulas and its capability to manipulate,
group, and factor number terms. We gather trigonometric expressions and their
reduced forms from the web, annotate the simplification process manually, and
translate it into the Lean formal language system. We then automatically
generate additional examples from the annotated samples to expand the dataset.
Furthermore, we develop an automatic generator based on Lean-Gym to create
dataset splits of varying difficulties and distributions in order to thoroughly
analyze the model's generalization ability. Our extensive experiments show our
proposed TRIGO poses a new challenge for advanced generative LMs, including
GPT-4 which is pre-trained on a considerable amount of open-source formal
theorem-proving language data, and provide a new tool to study the generative
LM's ability on both formal and mathematical reasoning.
Comment: Accepted by EMNLP 2023. Code is available at
https://github.com/menik1126/TRIG
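A TRIGO item asks a model to carry out a trigonometric reduction as a machine-checked Lean proof. A toy illustration of the target format (not taken from the dataset), using the standard mathlib Pythagorean identity:

```lean
-- Illustrative only: reduce sin²x + cos²x to 1 in Lean 3 / mathlib.
example (x : ℝ) : real.sin x ^ 2 + real.cos x ^ 2 = 1 :=
real.sin_sq_add_cos_sq x
```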
A Comprehensive Survey on Deep Graph Representation Learning
Graph representation learning aims to effectively encode high-dimensional
sparse graph-structured data into low-dimensional dense vectors, which is a
fundamental task that has been widely studied in a range of fields, including
machine learning and data mining. Classic graph embedding methods follow the
basic idea that the embedding vectors of interconnected nodes in the graph can
still maintain a relatively close distance, thereby preserving the structural
information between the nodes in the graph. However, this is sub-optimal due
to: (i) traditional methods have limited model capacity, which limits
learning performance; (ii) existing techniques typically rely on unsupervised
learning strategies and fail to couple with the latest learning paradigms;
(iii) representation learning and downstream tasks depend on each other and
should be jointly enhanced. With the remarkable success of deep learning,
deep graph representation learning has shown great potential and advantages
over shallow (traditional) methods, and a large number of deep graph
representation learning techniques have been proposed in the past decade,
especially graph neural networks. In this survey, we conduct a comprehensive
survey on current deep graph representation learning algorithms by proposing a
new taxonomy of existing state-of-the-art literature. Specifically, we
systematically summarize the essential components of graph representation
learning and categorize existing approaches by the ways of graph neural network
architectures and the most recent advanced learning paradigms. Moreover, this
survey also provides the practical and promising applications of deep graph
representation learning. Last but not least, we state new perspectives and
suggest challenging directions that deserve further investigation in the
future.
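Since graph neural networks anchor the survey's taxonomy, a minimal single-layer example helps fix ideas: a generic Kipf-and-Welling-style graph convolution in NumPy. This is an illustrative sketch, not code from any surveyed method.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).

    A : (n, n) binary adjacency matrix
    H : (n, f) node feature matrix
    W : (f, f') learnable weight matrix
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # ReLU nonlinearity
```

Stacking such layers lets each node's embedding aggregate information from progressively larger neighborhoods, which is the core idea behind most deep graph representation learners.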
Semaphorin 3A Contributes to Secondary Blood–Brain Barrier Damage After Traumatic Brain Injury
Semaphorin 3A (SEMA3A) is a member of the semaphorin family, a class of membrane-associated proteins that participate in the construction of nerve networks. SEMA3A has previously been reported to affect vascular permeability, but its influence in traumatic brain injury (TBI) is still unknown. To investigate the effects of SEMA3A, we used a mouse TBI model with a controlled cortical impact (CCI) device and an in vitro blood–brain barrier (BBB) injury model with oxygen-glucose deprivation (OGD). We tested post-TBI changes in the expression and distribution of SEMA3A and its related receptors (Nrp-1 and plexin-A1) through western blotting and double-immunofluorescence staining, respectively. Neurological outcomes were evaluated by modified neurological severity scores (mNSSs) and the beam-walking test. We examined BBB damage through Evans Blue dye extravasation, brain water content, and western blotting for VE-cadherin and p-VE-cadherin in vivo, and we examined the endothelial cell barrier through hopping probe ion conductance microscopy (HPICM), transwell leakage, and western blotting for VE-cadherin and p-VE-cadherin in vitro. Changes in miR-30b-5p were assessed by RT-PCR. Finally, the neuroprotective function of miR-30b-5p was measured by brain water content, mNSSs, and the beam-walking test. SEMA3A expression varied following TBI and peaked on the third day, showing an approximately fourfold increase compared with the sham group, with the protein concentrated at the lesion boundary. SEMA3A contributed to neurological function deficits and secondary BBB damage in vivo. Our results demonstrated that SEMA3A levels following OGD injury almost doubled compared with the control group, and that the negative effects of OGD injury could be improved by blocking SEMA3A expression. Furthermore, the expression of miR-30b-5p decreased by approximately 40% on the third day and 60% on the seventh day post-CCI, and OGD injury reduced miR-30b-5p expression by approximately 50%.
Additionally, the expression of SEMA3A post-TBI is regulated by miR-30b-5p, and miR-30b-5p could efficiently improve neurological outcomes post-TBI. Our results demonstrate that SEMA3A is a significant factor in secondary BBB damage after TBI and can be abolished by miR-30b-5p, which represents a potential therapeutic target.
Concatenated Deep-Learning Framework for Multitask Change Detection of Optical and SAR Images
Optical and synthetic aperture radar (SAR) images provide complementary information to each other. However, the heterogeneity of the same ground objects across the two modalities makes change detection (CD) difficult. Correspondingly, transformation-based methods have been developed with two independent tasks: image translation and CD. Most methods use deep learning only for image translation, and the subsequent simple clustering and threshold segmentation leads to poor CD results. Recently, a deep translation-based CD network (DTCDN) was proposed that applies deep learning to both image translation and CD to improve the results. However, DTCDN requires sequential training of two independent subnetwork structures at a high computational cost. Toward this end, a concatenated deep-learning framework for optical and SAR images, the multitask change detection network (MTCDN), is proposed by integrating the CD network into a complete generative adversarial network. This framework contains two generators and two discriminators for the optical and SAR image domains. "Multitask" refers to the combination of image identification by the discriminators and CD based on an improved UNet++. The generators are responsible for image translation, unifying the two images in the same feature domain. In the training and prediction stages, an end-to-end framework is realized to reduce cost. Experimental results on four optical and SAR datasets demonstrate the effectiveness and robustness of the proposed framework over eight baselines.
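To make the contrast concrete, the simple threshold segmentation that the paper criticizes can be sketched as below: per-pixel feature differencing in a shared domain followed by a fixed threshold. MTCDN replaces this naive comparison with a learned UNet++-based CD head operating after GAN-based translation; the function here is purely an illustrative baseline, with all names invented for the sketch.

```python
import numpy as np

def naive_change_map(feat_t1, feat_t2, tau=0.5):
    """Naive baseline CD: threshold the per-pixel feature distance
    between two co-registered images mapped into a shared domain.

    feat_t1, feat_t2 : (H, W, C) feature maps
    tau              : fixed decision threshold
    Returns a (H, W) binary change map (1 = changed).
    """
    dist = np.linalg.norm(feat_t1 - feat_t2, axis=-1)
    return (dist > tau).astype(np.uint8)
```

Because a single global threshold ignores context and modality-specific noise, this kind of segmentation is exactly where learned CD heads gain their advantage.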